Free hash table after grouping set/row number spill to release memory plus a hash table fix #11180

Closed
wants to merge 1 commit into from

xiaoxmeng
Contributor

Summary:
Shadow testing showed that hash aggregation can still hold a non-trivial amount of memory (a couple hundred MB)
after reclaim, because the grouping set keeps its hash table allocated. Currently the grouping set only clears the
hash table, which frees the groups but not the table itself. The row number operator has the same problem.

This PR:
(1) Frees the hash table after spill in both the row number operator and the grouping set, which makes memory
reclamation and arbitration more efficient. Global arbitration shadow testing shows a significant improvement.
(2) Frees the row number result vector on row number spill so that tests can check memory usage more strictly. We
assume a single vector is small, so this frees just 1MB per operator in a real workload.
(3) Fixes the free-table path in the hash table, which did not reset the capacity, and adds a unit test to cover it.
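
To illustrate why item (3) matters, here is a minimal sketch of a table whose `clear(freeTable)` releases the slot storage and also resets the capacity. `ToyHashTable`, `kInitialCapacity` and `kEmpty` are hypothetical names made up for this example; this is not Velox's `HashTable`, and the lazy-allocation check is an assumption chosen to make the failure mode visible.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants for this sketch only.
constexpr size_t kInitialCapacity = 16;
constexpr uint64_t kEmpty = 0;  // Sentinel for an unused slot.

// Toy open-addressing table with a fixed capacity. Not Velox's HashTable.
class ToyHashTable {
 public:
  // Inserts a non-zero key. Returns false for the sentinel key, for a
  // duplicate, or when the table is (nearly) full.
  bool insert(uint64_t key) {
    if (key == kEmpty) {
      return false;
    }
    // Lazy allocation relies on capacity_ being 0 whenever slots_ is freed.
    // If clear(true) freed slots_ but left a stale capacity_, this check
    // would be skipped and the probe below would index freed storage.
    if (capacity_ == 0) {
      slots_.assign(kInitialCapacity, kEmpty);
      capacity_ = kInitialCapacity;
    }
    size_t i = key % capacity_;
    while (slots_[i] != kEmpty && slots_[i] != key) {
      i = (i + 1) % capacity_;  // Linear probing.
    }
    // Always keep one slot empty so that probing terminates.
    if (slots_[i] == key || numEntries_ + 1 >= capacity_) {
      return false;
    }
    slots_[i] = key;
    ++numEntries_;
    return true;
  }

  // Drops all entries. With 'freeTable' set, also releases the slot storage
  // and resets the capacity so the two stay consistent.
  void clear(bool freeTable) {
    numEntries_ = 0;
    if (freeTable) {
      std::vector<uint64_t>().swap(slots_);  // Release the buffer.
      capacity_ = 0;
    } else {
      std::fill(slots_.begin(), slots_.end(), kEmpty);
    }
  }

  size_t capacity() const { return capacity_; }
  size_t numEntries() const { return numEntries_; }

 private:
  std::vector<uint64_t> slots_;
  size_t capacity_{0};
  size_t numEntries_{0};
};
```

After `clear(/*freeTable=*/true)` the next `insert` reallocates the slots, while `clear(/*freeTable=*/false)` keeps the storage for reuse.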

Differential Revision: D63964822

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 7, 2024

netlify bot commented Oct 7, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 6d15c10
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67043c6cb2f63d00082b9317

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D63964822

Contributor

@tanjialiang tanjialiang left a comment

Thanks for the catch, left some nits

@@ -742,7 +742,7 @@ bool GroupingSet::getOutput(
       : 0;
   if (numGroups == 0) {
     if (table_ != nullptr) {
-      table_->clear();
+      table_->clear(/*freeTable=*/true);
Contributor

Can we also check whether HashBuild needs this change (passing true to clear())?

Contributor

I think it is more intuitive to have HashTable::clear() take true as default instead of false.

Contributor Author

It is used by partial aggregation. Hash build doesn't need it because the table isn't built until the final stage, and the probe side always clears the entire table.
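
A rough sketch of the distinction in this reply, under the assumption that partial aggregation reuses the table right after flushing its output, whereas a spilled operator has no immediate use for it. `ToyTable`, `ToyAggregation`, `flushPartialOutput` and `spill` are hypothetical names for illustration, not Velox APIs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical slot count for this sketch only.
constexpr size_t kToyCapacity = 1024;

// Hypothetical stand-in for a hash table. Not Velox's HashTable.
struct ToyTable {
  std::vector<int> slots;
  size_t numGroups{0};

  // Allocates the slots if they are not currently allocated.
  void ensureAllocated(size_t capacity) {
    if (slots.empty()) {
      slots.assign(capacity, 0);
    }
  }

  // Drops the groups, and the slot storage too when 'freeTable' is set.
  void clear(bool freeTable) {
    numGroups = 0;
    if (freeTable) {
      std::vector<int>().swap(slots);
    } else {
      std::fill(slots.begin(), slots.end(), 0);
    }
  }
};

class ToyAggregation {
 public:
  void addInput(size_t newGroups) {
    table_.ensureAllocated(kToyCapacity);
    table_.numGroups += newGroups;
  }

  // Assumed partial-aggregation path: more input follows immediately, so
  // keeping the slots avoids a free followed by a reallocation.
  void flushPartialOutput() {
    table_.clear(/*freeTable=*/false);
  }

  // Assumed spill path: the goal is to give memory back, so free the slots.
  void spill() {
    table_.clear(/*freeTable=*/true);
  }

  size_t tableSlots() const { return table_.slots.size(); }
  size_t numGroups() const { return table_.numGroups; }

 private:
  ToyTable table_;
};
```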

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Oct 7, 2024
… plus a hash table fix (facebookincubator#11180)

Reviewed By: oerling

Differential Revision: D63964822

-  void resetTable();
+  /// all the inputs. If 'freeTable' is false, then hash table itself is not
+  /// freed but only table content.
+  void resetTable(bool freeTable = false);
Contributor

nit: since there are very few call sites, would it make sense to drop the default value? That way each caller makes an explicit decision, and future uses don't inadvertently skip freeing the table when it is required.
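
For reference, a small sketch of what this suggestion could look like: the parameter has no default, so every call site has to spell out its choice. `ToyGroupingSet` and `tableSize` are hypothetical names, and the body is a simplification, not the actual `GroupingSet::resetTable` implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical class for illustration. Not Velox's GroupingSet.
class ToyGroupingSet {
 public:
  explicit ToyGroupingSet(size_t capacity) : slots_(capacity, 0) {}

  // No default argument: a call such as resetTable() no longer compiles, so
  // the caller must decide whether the table storage is freed.
  void resetTable(bool freeTable) {
    if (freeTable) {
      std::vector<int>().swap(slots_);  // Release the storage.
    } else {
      std::fill(slots_.begin(), slots_.end(), 0);  // Keep it, drop contents.
    }
  }

  size_t tableSize() const { return slots_.size(); }

 private:
  std::vector<int> slots_;
};
```

Call sites then read as `resetTable(/*freeTable=*/true)` or `resetTable(/*freeTable=*/false)`, matching the argument-comment style already used in the diff above.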

@facebook-github-bot
Contributor

This pull request has been merged in e2231c5.

@xiaoxmeng xiaoxmeng deleted the export-D63964822 branch October 8, 2024 05:02

Conbench analyzed the 1 benchmark run on commit e2231c57.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
4 participants